Abstract

Variable selection is a difficult problem that is particularly challenging in the
analysis of high-dimensional genomic data. Here, we introduce the CAR score, a
novel and highly effective criterion for variable ranking in linear regression based
on Mahalanobis-decorrelation of the explanatory variables. The CAR score provides
a canonical ordering that encourages grouping of correlated predictors and
down-weights antagonistic variables. It decomposes the proportion of variance
explained and it is an intermediate between marginal correlation and the standardized
regression coefficient. As a population quantity, any preferred inference scheme
can be applied for its estimation. Using simulations we demonstrate that variable
selection by CAR scores is very effective and yields prediction errors and true and
false positive rates that compare favorably with modern regression techniques such
as elastic net and boosting. We illustrate our approach by analyzing data concerned
with diabetes progression and with the effect of aging on gene expression in the
human brain. The R package "care" implementing CAR score regression is available
from CRAN.
1 Introduction



Variable selection in the linear model is a classic statistical problem (George 2000). The
last decade with its immense technological advances especially in the life sciences has
revitalized interest in model selection in the context of the analysis of high-dimensional
data sets (Fan and Lv 2010). In particular, the advent of large-scale genomic data sets
has greatly stimulated the development of novel techniques for regularized inference
from small samples (e.g. Hastie et al. 2009).



Correspondingly, many regularized regression approaches that automatically perform
model selection have been introduced with great success, such as least angle
regression (Efron et al. 2004), elastic net (Zou and Hastie 2005), the structured elastic
net (Li and Li 2008), OSCAR (Bondell and Reich 2008), the Bayesian elastic net (Li
and Lin 2010), and the random lasso (Wang et al. 2011). By construction, in all these
methods variable selection is tightly linked with a specific inference procedure, typically
of Bayesian flavor or using a variant of penalized maximum likelihood.

Here, we offer an alternative view on model selection in the linear model that 
operates on the population level and is not tied to a particular estimation paradigm. 
We suggest that variable ranking, aggregation and selection in the linear model is best 
understood and conducted on the level of standardized, Mahalanobis-decorrelated 
predictors. Specifically, we propose CAR scores, defined as the marginal correlations 
adjusted for correlation among explanatory variables, as a natural variable importance 
criterion. This quantity emerges from a predictive view of the linear model and leads 
to a simple additive decomposition of the proportion of explained variance and to a 
canonical ordering of the explanatory variables. By comparison of CAR scores with 
various other variable selection and regression approaches, including elastic net, lasso 
and boosting, we show that CAR scores, despite their simplicity, are capable of effective 
model selection both in small and in large sample situations. 

The remainder of the paper is organized as follows. First, we revisit the linear model 
from a predictive population-based view and briefly review standard variable selection 
criteria. Next, we introduce the CAR score and discuss its theoretical properties. Finally, 
we conduct extensive computer simulations as well as data analysis to investigate the
practical performance of CAR scores in high-dimensional regression. 



2 Linear model revisited 

In the following, we recollect basic properties of the linear regression model from the
perspective of the best linear predictor (e.g. Whittaker 1990, Chapter 5).



2.1 Setup and notation 

We are interested in modeling the linear relationship between a metric univariate
response variable $Y$ and a vector of predictors $X = (X_1, \ldots, X_d)^T$. We treat both $Y$ and
$X$ as random variables, with means $E(Y) = \mu_Y$ and $E(X) = \mu$ and (co)variances
$\mathrm{Var}(Y) = \sigma_Y^2$, $\mathrm{Var}(X) = \Sigma$, and
$\mathrm{Cov}(Y, X) = \Sigma_{YX} = E((Y - \mu_Y)(X - \mu)^T) = \Sigma_{XY}^T$.
The matrix $\Sigma$ has dimension $d \times d$ and $\Sigma_{YX}$ is of size $1 \times d$. With $P$ (= capital "rho")
and $P_{YX}$ we denote the correlations among predictors and the marginal correlations
between response and predictors, respectively. With $V = \mathrm{diag}\{\mathrm{Var}(X_1), \ldots, \mathrm{Var}(X_d)\}$
we decompose $\Sigma = V^{1/2} P V^{1/2}$ and $\Sigma_{YX} = \sigma_Y P_{YX} V^{1/2}$.

2.2 Best linear predictor 

The best linear predictor of $Y$ is the linear combination of the explanatory variables

$$Y^* = a + b^T X \qquad (1)$$

that minimizes the mean squared prediction error $E((Y - Y^*)^2)$. This is achieved for
regression coefficients

$$b = \Sigma^{-1} \Sigma_{XY} \qquad (2)$$

and intercept

$$a = \mu_Y - b^T \mu . \qquad (3)$$

The coefficients $a$ and $b = (b_1, \ldots, b_d)^T$ are constants, and not random variables like $X$, $Y$
and $Y^*$. The resulting minimal prediction error is

$$E((Y - Y^*)^2) = \sigma_Y^2 - b^T \Sigma b .$$

Alternatively, the irreducible error may be written $E((Y - Y^*)^2) = \sigma_Y^2 (1 - \Omega^2)$ where
$\Omega = \mathrm{Corr}(Y, Y^*)$ and

$$\Omega^2 = P_{YX} P^{-1} P_{XY}$$

is the squared multiple correlation coefficient. Furthermore, $\mathrm{Cov}(Y, Y^*) = \sigma_Y^2 \Omega^2$ and
$E(Y^*) = \mu_Y$. The expectation $E((Y - Y^*)^2) = \mathrm{Var}(Y - Y^*)$ is also called the unexplained
variance or noise variance. Together with the explained variance or signal variance
$\mathrm{Var}(Y^*) = \sigma_Y^2 \Omega^2$ it adds up to the total variance $\mathrm{Var}(Y) = \sigma_Y^2$. Accordingly, the
proportion of explained variance is

$$\frac{\mathrm{Var}(Y^*)}{\mathrm{Var}(Y)} = \Omega^2 ,$$

which indicates that $\Omega^2$ is the central quantity for understanding both nominal prediction
error and variance decomposition in the linear model. The ratio of signal variance to noise
variance is

$$\frac{\mathrm{Var}(Y^*)}{\mathrm{Var}(Y - Y^*)} = \frac{\Omega^2}{1 - \Omega^2} .$$

A summary of these relations is given in Tab. 1 along with the empirical error
decomposition in terms of observed sum of squares.

If instead of the optimal parameters $a$ and $b$ we employ $a' = a + \Delta a$ and $b' = b + \Delta b$
the minimal mean squared prediction error $E((Y - Y^*)^2)$ increases by the model error

$$ME(\Delta a, \Delta b) = (\Delta b)^T \Sigma \, \Delta b + (\Delta a)^2 .$$

The relative model error is the ratio of the model error and the irreducible error $E((Y - Y^*)^2)$.
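The relations of this subsection can be checked numerically. The following is a minimal sketch in Python with NumPy; the population parameters (`Sigma`, `Sigma_xy`, `sigma2_y`, `mu`, `mu_y`) are hypothetical values chosen only for illustration and do not come from the paper.

```python
import numpy as np

# Hypothetical population parameters, chosen only for illustration.
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])        # Var(X)
Sigma_xy = np.array([0.8, 0.6])       # Cov(X, Y)
sigma2_y = 1.5                        # Var(Y)
mu = np.array([1.0, -1.0])            # E(X)
mu_y = 2.0                            # E(Y)

# Eq. 2 and Eq. 3: coefficients and intercept of the best linear predictor.
b = np.linalg.solve(Sigma, Sigma_xy)
a = mu_y - b @ mu

# Explained (signal) and unexplained (noise) variance.
var_explained = b @ Sigma @ b
var_unexplained = sigma2_y - var_explained

# Squared multiple correlation from the correlation-level quantities:
# Omega^2 = P_YX P^{-1} P_XY.
v = np.diag(Sigma)
P = Sigma / np.sqrt(np.outer(v, v))
P_xy = Sigma_xy / np.sqrt(v * sigma2_y)
omega2 = P_xy @ np.linalg.solve(P, P_xy)

print("b =", b, "a =", a, "Omega^2 =", omega2)
print("signal/noise:", var_explained / var_unexplained, omega2 / (1 - omega2))
```

Both routes to the signal-to-noise ratio agree, which is simply the variance decomposition above written once on the covariance scale and once on the correlation scale.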



Table 1: Variance decomposition in terms of squared multiple correlation $\Omega^2$ and
corresponding empirical sums of squares.

Level        Total variance                       = unexplained variance                    + explained variance
Population   $\mathrm{Var}(Y)$                    = $\mathrm{Var}(Y - Y^*)$                 + $\mathrm{Var}(Y^*)$
             $\sigma_Y^2$                         = $\sigma_Y^2 (1 - \Omega^2)$             + $\sigma_Y^2 \Omega^2$
Empirical    TSS                                  = RSS                                     + ESS
             $\sum_{i=1}^n (y_i - \bar{y})^2$     = $\sum_{i=1}^n (y_i - \hat{y}_i)^2$      + $\sum_{i=1}^n (\hat{y}_i - \bar{y})^2$
             d.f. = n - 1                           d.f. = n - d - 1                          d.f. = d

Abbreviations: $\bar{y} = \frac{1}{n} \sum_{i=1}^n y_i$; d.f.: degrees of freedom; TSS: total sum of squares; RSS:
residual sum of squares; ESS: explained sum of squares.



2.3 Standardized regression equation 

Often, it is convenient to center and standardize the response and the predictor variables.
With $Y_{std} = (Y - \mu_Y)/\sigma_Y$ and $X_{std} = V^{-1/2}(X - \mu)$ the predictor equation (Eq. 1) can
be written as

$$Y^*_{std} = (Y^* - \mu_Y)/\sigma_Y = b_{std}^T X_{std} \qquad (4)$$

where

$$b_{std} = V^{1/2} b \, \sigma_Y^{-1} = P^{-1} P_{XY} \qquad (5)$$

are the standardized regression coefficients. The standardized intercept $a_{std} = 0$ vanishes
because of the centering.

2.4 Estimation of regression coefficients 

In practice, the parameters $a$ and $b$ are unknown. Therefore, to predict the response $\hat{y}$
for data $x$ using $\hat{y} = \hat{a} + \hat{b}^T x$ we have to learn $\hat{a}$ and $\hat{b}$ from some training data. In our
notation the observations $x_i$ with $i \in \{1, \ldots, n\}$ correspond to the random variable $X$, $y_i$
to $Y$, and $\hat{y}_i$ to $Y^*$.

For estimation we distinguish between two main scenarios. In the large sample
case with $n \gg d$ we simply replace in Eq. 2 and Eq. 3 the means and covariances
by their empirical estimates $\hat{\mu}_Y$, $\hat{\mu}$, $\hat{\Sigma} = S$, $\hat{\Sigma}_{XY} = S_{XY}$, etc. This gives the standard
(and asymptotically optimal) ordinary least squares (OLS) estimates $\hat{b}_{OLS} = S^{-1} S_{XY}$
and $\hat{a}_{OLS} = \hat{\mu}_Y - \hat{b}_{OLS}^T \hat{\mu}$. Similarly, the coefficient of determination $R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}$ is
the empirical estimate of $\Omega^2$ (cf. Tab. 1). If unbiased variance estimates are used the
adjusted coefficient of determination $R^2_{adj} = 1 - \frac{\mathrm{RSS}/(n-d-1)}{\mathrm{TSS}/(n-1)}$ is obtained as an alternative
estimate of $\Omega^2$. For data $X$ and $Y$ normally distributed it is also possible to derive exact
distributions of the estimated quantities. For example, the null density of the empirical
squared multiple correlation coefficient $\hat{\Omega}^2 = R^2$ is $f(R^2) = \mathrm{Beta}(R^2; \frac{d}{2}, \frac{n-d-1}{2})$.
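As an illustration of the large sample case, the sketch below plugs empirical covariances into Eq. 2 and Eq. 3 and compares the result with a standard least squares solver. The simulated data, the seed and all variable names are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3

# Hypothetical data: three predictors, one of them without effect.
X = rng.normal(size=(n, d))
y = 2.0 + X @ np.array([1.0, 0.0, -0.5]) + rng.normal(size=n)

# Empirical (co)variances of the joint vector (X, Y).
C = np.cov(np.column_stack([X, y]), rowvar=False)
S, S_xy = C[:d, :d], C[:d, d]

# Plug-in versions of Eq. 2 and Eq. 3.
b_ols = np.linalg.solve(S, S_xy)
a_ols = y.mean() - b_ols @ X.mean(axis=0)

# Reference solution from a least squares solver with an intercept column.
coef = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)[0]

# Coefficient of determination and its adjusted version.
y_hat = a_ols + X @ b_ols
rss = np.sum((y - y_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r2 = 1 - rss / tss
r2_adj = 1 - (rss / (n - d - 1)) / (tss / (n - 1))
print(b_ols, a_ols, r2, r2_adj)
```

The plug-in estimate coincides with the solver output because the normalization of the covariance estimates cancels in $S^{-1} S_{XY}$.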



Conversely, in a "small n, large d" setting we use regularized estimates of the covariance
matrices $\Sigma$ and $\Sigma_{XY}$. For example, using James-Stein-type shrinkage estimation leads to
the regression approach of Opgen-Rhein and Strimmer (2007), and employing penalized
maximum likelihood inference results in scout regression (Witten and Tibshirani 2009),
which depending on the choice of penalty includes elastic net (Zou and Hastie 2005)
and lasso (Tibshirani 1996) as special cases.



3 Measuring variable importance 



Variable importance may be defined in many different ways, see Firth (1998) for an
overview. Here, we consider a variable to be "important" if it is informative about
the response and thus if its inclusion in the predictor increases the explained variance
or, equivalently, reduces the prediction error. To quantify the importance $\phi(X_j)$ of the
explanatory variables $X_j$ a large number of criteria have been suggested (Grömping
2007). Desired properties of such a measure include that it decomposes the multiple
correlation coefficient $\sum_{j=1}^d \phi(X_j) = \Omega^2$, that each $\phi(X_j) \geq 0$ is non-negative, and that
the decomposition respects orthogonal subgroups (Genizi 1993). The latter implies for a
correlation matrix $P$ with block structure that the sum of the $\phi(X_j)$ of all variables $X_j$
within a block is equal to the squared multiple correlation coefficient of that block with
the response.



3.1 Marginal correlation 

If there is no correlation among predictors (i.e. if $P = I$) then there is general agreement that
the marginal correlations $P_{XY} = (\rho_1, \ldots, \rho_d)^T$ provide an optimal way to rank variables
(e.g. Fan and Lv 2008). In this special case the predictor equation (Eq. 4) simplifies to

$$Y^*_{std} = P_{XY}^T X_{std} .$$

For $P = I$ the marginal correlations represent the influence of each standardized
covariate in predicting the standardized response. Moreover, in this case the sum of the
squared marginal correlations $\Omega^2 = \sum_{j=1}^d \rho_j^2$ equals the squared multiple correlation
coefficient. Thus, the contribution of each variable $X_j$ to reducing relative prediction
error is $\rho_j^2$ (recall from Tab. 1 that $\mathrm{Var}(Y - Y^*)/\sigma_Y^2 = 1 - \Omega^2$). For this reason in the
uncorrelated setting

$$\phi^{uncorr}(X_j) = \rho_j^2$$

is justifiably the canonical measure of variable importance for $X_j$.

However, for general $P$, i.e. in the presence of correlation among predictors, the
squared marginal correlations do not provide a decomposition of $\Omega^2$ as $P_{XY}^T P_{XY} \neq \Omega^2$.
Thus, they are not suited as a general variable importance criterion.



3.2 Standardized regression coefficients 

From Eq. 4 one may consider standardized regression coefficients $b_{std}$ (Eq. 5) as a
generalization of marginal correlations to the case of correlation among predictors. However,
while the $b_{std}$ properly reduce to marginal correlations for $P = I$ the standardized
regression coefficients also do not lead to a decomposition of $\Omega^2$ as
$b_{std}^T b_{std} = P_{YX} P^{-2} P_{XY} \neq \Omega^2$. Further objections to using $b_{std}$ as a measure of
variable importance are discussed in Bring (1994).



3.3 Partial correlation 

Another common way to rank predictor variables and to assign p-values is by means of
t-scores $\tau_{XY} = (\tau_1, \ldots, \tau_d)^T$ (which in some texts are also called standardized regression
coefficients even though they are not to be confused with $b_{std}$). The t-scores are directly
computed from regression coefficients via

$$\tau_{XY} = \mathrm{diag}\{P^{-1}\}^{-1/2} \, b_{std} \, (1 - \Omega^2)^{-1/2} \sqrt{\mathrm{d.f.}}$$
$$\phantom{\tau_{XY}} = \mathrm{diag}\{\Sigma^{-1}\}^{-1/2} \, b \, \sigma_Y^{-1} (1 - \Omega^2)^{-1/2} \sqrt{\mathrm{d.f.}} \, .$$

The constant d.f. is the degree of freedom and $\mathrm{diag}\{M\}$ the matrix $M$ with its
off-diagonal entries set to zero.

Completely equivalent to t-scores in terms of variable ranking are the partial
correlations $\tilde{P}_{XY} = (\tilde{\rho}_1, \ldots, \tilde{\rho}_d)^T$ between the response $Y$ and predictor $X_j$ conditioned on all
the remaining predictors $X_{\neq j}$. The t-scores can be converted to partial correlations using
the relationship

$$\tilde{\rho}_j = \tau_j / \sqrt{\tau_j^2 + \mathrm{d.f.}} \, .$$

Interestingly, the value of d.f. specified in the t-scores cancels out when computing $\tilde{\rho}_j$.
An alternative but equivalent route to obtain the partial correlations is by inversion and
subsequent standardization of the joint correlation matrix of $Y$ and $X$ (e.g. Opgen-Rhein
and Strimmer 2007).

The p-values computed in many statistical software packages for each variable in a
linear model are based on empirical estimates of $\tau_{XY}$ with d.f. $= n - d - 1$. Assuming
normal $X$ and $Y$ the null distribution of $\hat{\tau}_j$ is Student t with $n - d - 1$ degrees of freedom.
Exactly the same p-values are obtained from the empirical partial correlations $\tilde{r}_j$ which
have null density $f(\tilde{r}_j) = |\tilde{r}_j| \, \mathrm{Beta}(\tilde{r}_j^2; \frac{1}{2}, \frac{\kappa - 1}{2})$ with $\kappa = \mathrm{d.f.} + 1 = n - d$ and $\mathrm{Var}(\tilde{r}_j) = \frac{1}{\kappa}$.

Despite being widely used, a key problem of partial correlations $\tilde{P}_{XY}$ (and hence
also of the corresponding t-scores) for use in variable ranking and assigning variable
importance is that in the case of vanishing correlation $P = I$ they do not properly reduce
to the marginal correlations $P_{XY}$. This can be seen already from the simple case with
three variables $Y$, $X_1$, and $X_2$ with partial correlation

$$\tilde{\rho}_{Y, X_1 | X_2} = \frac{\rho_{Y,X_1} - \rho_{Y,X_2} \, \rho_{X_1,X_2}}{\sqrt{1 - \rho_{Y,X_2}^2} \, \sqrt{1 - \rho_{X_1,X_2}^2}}$$

which for $\rho_{X_1,X_2} = 0$ is not identical to $\rho_{Y,X_1}$ unless $\rho_{Y,X_2}$ also vanishes.
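The three-variable case can be reproduced with the inversion route mentioned above. In this minimal sketch the marginal correlations `rho_y1` and `rho_y2` are hypothetical values, and the two predictors are taken to be uncorrelated ($P = I$).

```python
import numpy as np

# Hypothetical marginal correlations of Y with two uncorrelated predictors.
rho_y1, rho_y2 = 0.5, 0.6

# Joint correlation matrix of (Y, X1, X2) with Corr(X1, X2) = 0.
R = np.array([[1.0,    rho_y1, rho_y2],
              [rho_y1, 1.0,    0.0],
              [rho_y2, 0.0,    1.0]])

# Invert and standardize: partial correlation = -K_ij / sqrt(K_ii K_jj).
K = np.linalg.inv(R)
scale = np.sqrt(np.diag(K))
partial = -K / np.outer(scale, scale)

pcor_y1 = partial[0, 1]   # Y and X1 given X2
pcor_y2 = partial[0, 2]   # Y and X2 given X1
print(pcor_y1, pcor_y2)
```

Here the partial correlation of $Y$ and $X_1$ equals $0.5/\sqrt{1 - 0.6^2} = 0.625$, which differs from the marginal correlation 0.5 even though the predictors are uncorrelated.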

3.4 Hoffman-Pratt product measure

First suggested by Hoffman (1960) and later defended by Pratt (1987) is the following
alternative measure of variable importance

$$\phi^{HP}(X_j) = (b_{std})_j \, \rho_j .$$

By construction, $\sum_{j=1}^d \phi^{HP}(X_j) = \Omega^2$, and if correlation among predictors is zero then
$\phi^{HP}(X_j) = \rho_j^2$. Moreover, the Hoffman-Pratt measure satisfies the orthogonal
compatibility criterion (Genizi 1993).



However, in addition to these desirable properties the Hoffman-Pratt variable
importance measure also exhibits two severe defects. First, $\phi^{HP}(X_j)$ may become negative,
and second the relationship of the Hoffman-Pratt measure with the original predictor
equation is unclear. Therefore, the use of $\phi^{HP}(X_j)$ is discouraged by most authors (cf.
Grömping 2007).



3.5 Genizi's measure 

More recently, Genizi (1993) proposed the variable importance measure

$$\phi^{G}(X_j) = \sum_{k=1}^d \left( (P^{1/2})_{jk} \, (P^{-1/2} P_{XY})_k \right)^2 .$$

Here and in the following $P^{1/2}$ is the uniquely defined matrix square root with $P^{1/2}$
symmetric and positive definite.

Genizi's measure provides a decomposition $\sum_{j=1}^d \phi^{G}(X_j) = \Omega^2$, reduces to the
squared marginal correlations in case of no correlation, and obeys the orthogonality
criterion. In contrast to $\phi^{HP}(X_j)$ the Genizi measure is by construction also non-negative,
$\phi^{G}(X_j) \geq 0$.

However, like the Hoffman-Pratt measure the connection of $\phi^{G}(X_j)$ with the original
predictor equations is unclear.
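The properties of the two measures can be illustrated on a small example. The sketch below assumes a hypothetical correlation matrix `P` and marginal correlations `P_xy`; the helper `sym_power` is a name introduced here for the symmetric matrix power and is not taken from the paper or from any package.

```python
import numpy as np

def sym_power(M, p):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(M)
    return (U * w ** p) @ U.T

# Hypothetical population correlations with two strongly correlated predictors.
P = np.array([[1.0, 0.9],
              [0.9, 1.0]])
P_xy = np.array([0.6, 0.3])

b_std = np.linalg.solve(P, P_xy)          # Eq. 5
omega2 = P_xy @ b_std                     # squared multiple correlation

# Hoffman-Pratt product measure: standardized coefficient times marginal correlation.
phi_hp = b_std * P_xy

# Genizi measure: sum_k ((P^{1/2})_jk (P^{-1/2} P_XY)_k)^2.
w = sym_power(P, -0.5) @ P_xy
phi_g = (sym_power(P, 0.5) ** 2) @ (w ** 2)

print("Hoffman-Pratt:", phi_hp, "sum =", phi_hp.sum())
print("Genizi:       ", phi_g, "sum =", phi_g.sum())
print("Omega^2:      ", omega2)
```

In this example both measures add up to $\Omega^2$, but the second Hoffman-Pratt value is negative whereas both Genizi values are non-negative.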

4 Variable selection using CAR scores 

In this section we introduce CAR scores $\omega = (\omega_1, \ldots, \omega_d)^T$ and the associated variable
importance measure $\phi^{CAR}(X_j) = \omega_j^2$ and discuss their use in variable selection.

Specifically, we argue that CAR scores $\omega$ and $\phi^{CAR}(X_j)$ naturally generalize marginal
correlations $P_{XY} = (\rho_1, \ldots, \rho_d)^T$ and the importance measure $\phi^{uncorr}(X_j) = \rho_j^2$ to settings
with non-vanishing correlation $P$ among explanatory variables.



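Table 2: Relationship between CAR scores $\omega$ and common quantities from the linear
model. In the first row $\Sigma^{1/2}$ is understood as $P^{1/2} V^{1/2}$.

Criterion                        Relationship with CAR scores $\omega$
Regression coefficient           $b = \Sigma^{-1/2} \omega \, \sigma_Y$  <=>  $\omega = \Sigma^{1/2} b \, \sigma_Y^{-1}$
Standardized regression coeff.   $b_{std} = P^{-1/2} \omega$  <=>  $\omega = P^{1/2} b_{std}$
Marginal correlation             $P_{XY} = P^{1/2} \omega$  <=>  $\omega = P^{-1/2} P_{XY}$
Regression t-score               $\tau_{XY} = (P \, \mathrm{diag}\{P^{-1}\})^{-1/2} \, \omega \, (1 - \omega^T \omega)^{-1/2} \sqrt{\mathrm{d.f.}}$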



4.1 Definition of the CAR score 

The CAR scores $\omega$ are defined as

$$\omega = P^{-1/2} P_{XY} , \qquad (6)$$

i.e. as the marginal correlations $P_{XY}$ adjusted by the factor $P^{-1/2}$. Accordingly, the
acronym "CAR" is an abbreviation for Correlation-Adjusted (marginal) coRrelation. The
CAR scores $\omega$ are constant population quantities and not random variables.

Tab. 2 summarizes some connections of CAR scores with various other quantities
from the linear model. For instance, CAR scores may be viewed as intermediates
between marginal correlations and standardized regression coefficients. If correlation
among predictors vanishes the CAR scores become identical to the marginal correlations.

The CAR score is a relative of the CAT score (i.e. correlation-adjusted t-score) that
we have introduced previously as variable ranking statistic for classification problems
(Zuber and Strimmer 2009). In Tab. 3 we review some properties of the CAT score in
comparison with the CAR score. In particular, in the CAR score the marginal correlations
$P_{XY}$ play the same role as the t-scores $\tau$ in the CAT score.
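The definition in Eq. 6 and the relations with the standardized regression coefficients and the marginal correlations can be verified numerically. This is a minimal sketch under assumed population values: `P` and `P_xy` are hypothetical, and `sym_power` is an illustrative helper that computes the symmetric matrix power by eigendecomposition.

```python
import numpy as np

def sym_power(M, p):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(M)
    return (U * w ** p) @ U.T

# Hypothetical correlation among three predictors and with the response.
P = np.array([[1.00, 0.50, 0.25],
              [0.50, 1.00, 0.50],
              [0.25, 0.50, 1.00]])
P_xy = np.array([0.5, 0.4, 0.1])

omega = sym_power(P, -0.5) @ P_xy      # CAR scores, Eq. 6
b_std = np.linalg.solve(P, P_xy)       # standardized coefficients, Eq. 5

# Relations between CAR scores, b_std and marginal correlations.
omega_from_b = sym_power(P, 0.5) @ b_std
P_xy_from_omega = sym_power(P, 0.5) @ omega

# Without correlation among predictors the CAR scores are the marginal correlations.
omega_uncorrelated = sym_power(np.eye(3), -0.5) @ P_xy

print("CAR scores:", omega)
print("sum of squares:", omega @ omega, "Omega^2:", P_xy @ b_std)
```

The sum of the squared CAR scores reproduces $\Omega^2 = P_{YX} P^{-1} P_{XY}$, since $\omega^T \omega = P_{YX} P^{-1/2} P^{-1/2} P_{XY}$.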

4.2 Estimation of CAR scores 

In order to obtain estimates $\hat{\omega}$ of the CAR scores we substitute in Eq. 6 suitable estimates
of the two matrices $P^{-1/2}$ and $P_{XY}$. For large sample sizes $n \gg d$ we suggest using
empirical and for small sample size shrinkage estimators, e.g. as in Schäfer and Strimmer
(2005). An efficient algorithm for calculating the inverse matrix square root $R^{-1/2}$ for
the shrinkage correlation estimator is described in Zuber and Strimmer (2009). If the
correlation matrix exhibits a known pattern, e.g., a block-diagonal structure, then it is
advantageous to employ a correspondingly structured estimator.

The null distribution of the empirical CAR scores under normality is identical to
that of the empirical marginal correlations. Therefore, regardless of the value of $P$ the
null density is $f(\hat{\omega}_j) = |\hat{\omega}_j| \, \mathrm{Beta}(\hat{\omega}_j^2; \frac{1}{2}, \frac{\kappa - 1}{2})$ with $\kappa = n - 1$.
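For the large sample case, plugging empirical correlations into Eq. 6 can be sketched as follows. The function name `car_scores`, the simulated data and the seed are assumptions for illustration; this is not the interface of the "care" package, and the shrinkage estimator recommended for small samples is not implemented here.

```python
import numpy as np

def sym_power(M, p):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(M)
    return (U * w ** p) @ U.T

def car_scores(X, y):
    """Empirical CAR scores from sample correlations (no shrinkage)."""
    d = X.shape[1]
    C = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    R, r_xy = C[:d, :d], C[:d, d]
    return sym_power(R, -0.5) @ r_xy

# Hypothetical data with correlation between the first two predictors.
rng = np.random.default_rng(1)
n, d = 500, 4
X = rng.normal(size=(n, d))
X[:, 1] += 0.8 * X[:, 0]
y = 1.5 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(size=n)

omega_hat = car_scores(X, y)

# Coefficient of determination of the OLS fit for comparison.
A = np.column_stack([np.ones(n), X])
resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
print(omega_hat, omega_hat @ omega_hat, r2)
```

With empirical estimates the squared CAR scores add up to the coefficient of determination $R^2$ of the full OLS fit, mirroring the population identity for $\Omega^2$.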



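Table 3: Comparison of CAT and CAR scores.

                                CAT                                                                      CAR
Response Y                      Binary                                                                   Metric
Definition                      $\tau^{adj} = P^{-1/2} \tau$                                             $\omega = P^{-1/2} P_{XY}$
Marginal quantity               $\tau = (\frac{1}{n_1} + \frac{1}{n_2})^{-1/2} V^{-1/2} (\mu_1 - \mu_2)$   $P_{XY}$
Decomposition                   Hotelling's $T^2$                                                        Squared multiple correlation
                                $T^2 = \sum_{j=1}^d (\tau_j^{adj})^2$                                    $\Omega^2 = \sum_{j=1}^d \omega_j^2$
Global test statistic           $T_s^2 = \sum_{j=1}^s (\hat{\tau}_j^{adj})^2$                            $R_s^2 = \sum_{j=1}^s \hat{\omega}_j^2$
for a set of size s
Null distribution for           $T_s^2 \, \frac{m - s + 1}{m s} \sim F(s, m - s + 1)$                    $R_s^2 \sim \mathrm{Beta}(\frac{s}{2}, \frac{n - s - 1}{2})$
empirical statistic             with $m = n_1 + n_2 - 2$
under normality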

4.3 Best predictor in terms of CAR scores 

Using CAR scores the best linear predictor (Eq. 4) can be written in the simple form

$$Y^*_{std} = \omega^T \delta(X) = \sum_{j=1}^d \omega_j \, \delta_j(X) , \qquad (7)$$

where

$$\delta(X) = P^{-1/2} V^{-1/2} (X - \mu) = P^{-1/2} X_{std} \qquad (8)$$

are the Mahalanobis-decorrelated and standardized predictors with $\mathrm{Var}(\delta(X)) = I$.
Thus, the CAR scores $\omega$ are the weights that describe the influence of each decorrelated
variable in predicting the standardized response. Furthermore, with $\mathrm{Corr}(X_{std}, Y) = P_{XY}$
we have

$$\omega = \mathrm{Corr}(\delta(X), Y) ,$$

i.e. CAR scores are the correlations between the response and the decorrelated covariates.

4.4 Special properties of the Mahalanobis transform 

The computation of the CAR score relies on decorrelation of predictors using Eq. 8, which
is known as the Mahalanobis transform. Importantly, the Mahalanobis transform has
a number of properties not shared by other decorrelation transforms with $\mathrm{Var}(\delta(X)) = I$.
First, it is the unique linear transformation that minimizes
$E((\delta(X) - X_{std})^T (\delta(X) - X_{std}))$, see Genizi (1993) and Hyvärinen et al. (2001, Section 6.5).
Therefore, the Mahalanobis-decorrelated predictors $\delta(X)$ are nearest to the original
standardized predictors $X_{std}$. Second, as $P^{-1/2}$ is positive definite, $\delta(X)^T X_{std} > 0$ for any
$X_{std}$, which implies that the decorrelated and the standardized predictors are informative
about each other also on a componentwise level (for example they must have the same sign).
The correlation of the corresponding elements in $X_{std}$ and $\delta(X)$ is given by
$\mathrm{Corr}((X_{std})_i, \delta(X)_i) = (P^{1/2})_{ii}$.
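The Mahalanobis transform of Eq. 8 and its stated properties can be checked on simulated data using empirical plug-in quantities. In this sketch the data-generating values, the seed and the helper `sym_power` are assumptions made for illustration only.

```python
import numpy as np

def sym_power(M, p):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(M)
    return (U * w ** p) @ U.T

# Hypothetical data: two correlated predictors and a linear response.
rng = np.random.default_rng(2)
n = 300
X = rng.multivariate_normal(np.zeros(2), [[1.0, 0.7], [0.7, 1.0]], size=n)
y = X @ np.array([1.0, -0.5]) + rng.normal(size=n)

# Standardize, then decorrelate with the inverse square root of the
# empirical correlation matrix (rows of `delta` are the transformed samples).
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.corrcoef(X_std, rowvar=False)
delta = X_std @ sym_power(R, -0.5)

# Var(delta(X)) = I.
cov_delta = np.cov(delta, rowvar=False)

# CAR scores as correlations between response and decorrelated predictors.
car_from_delta = np.corrcoef(np.column_stack([delta, y]), rowvar=False)[:2, 2]
r_xy = np.corrcoef(np.column_stack([X, y]), rowvar=False)[:2, 2]
omega_hat = sym_power(R, -0.5) @ r_xy

# Componentwise correlation between standardized and decorrelated predictors.
cross = np.corrcoef(np.column_stack([X_std, delta]), rowvar=False)[:2, 2:]
print(cov_delta, car_from_delta, omega_hat, np.diag(cross))
```

The diagonal of the cross-correlation block equals the diagonal of the square root of the empirical correlation matrix, so each decorrelated predictor stays positively correlated with its original counterpart.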

4.5 Comparison of CAR scores and partial correlation 

Further insights into the interpretation of CAR scores can be gained by a comparison 
with partial correlation. 

The partial correlation between $Y$ and a predictor $X_i$ is obtained by first removing the
linear effect of the remaining $d - 1$ predictors $X_{\neq i}$ from both $Y$ and $X_i$ and subsequently
computing the correlation between the respective remaining residuals.

In contrast, with CAR scores the response $Y$ is left unchanged whereas all $d$ predictors
are simultaneously orthogonalized, i.e. the linear effect of the other variables $X_{\neq i}$ on
$X_i$ is removed simultaneously from all predictors (Hyvärinen et al. 2001, Section 6.5).
Subsequently, the CAR score is found as the correlation between the "residuals", i.e. the
unchanged response and the decorrelated predictors. Thus, CAR scores may be viewed
as a multivariate variant of the so-called part correlations.

4.6 Variable importance and error decomposition 

The squared multiple correlation coefficient is the sum of the squared CAR scores,
$\Omega^2 = \omega^T \omega = \sum_{j=1}^d \omega_j^2$. Consequently, the nominal mean squared prediction error in
terms of CAR scores can be written

$$E((Y - Y^*)^2) = \sigma_Y^2 (1 - \omega^T \omega) ,$$

which implies that (decorrelated) variables with small CAR scores contribute little to
improve the prediction error or to reduce the unexplained variance. This suggests to
define

$$\phi^{CAR}(X_j) = \omega_j^2$$

as a measure of variable importance. $\phi^{CAR}(X_j)$ is always non-negative, reduces to $\rho_j^2$ for
uncorrelated explanatory variables, and leads to the canonical decomposition

$$\Omega^2 = \sum_{j=1}^d \phi^{CAR}(X_j) .$$

Furthermore, it is easy to see that $\phi^{CAR}(X_j)$ satisfies the orthogonal compatibility
criterion demanded in Genizi (1993). Interestingly, Genizi's own importance measure
$\phi^{G}(X_j)$ can be understood as a weighted average $\phi^{G}(X_j) = \sum_{k=1}^d (P^{1/2})_{jk}^2 \, \phi^{CAR}(X_k)$ of
squared CAR scores.

In short, what we propose here is to first Mahalanobis-decorrelate the predictors to
establish a canonical basis, and subsequently we define the importance of a variable $X_j$
as the natural weight $\omega_j^2$ in this reference frame.



4.7 Grouped CAR score 

Due to the additivity of squared CAR scores it is straightforward to define a grouped CAR
score for a set of variables from the sum of the individual squared CAR scores,

$$\omega_{grouped} = \sqrt{\sum_{g \in \mathrm{set}} \omega_g^2} \, .$$

As with the grouped CAT score (Zuber and Strimmer 2009) we also may add a sign in
this definition.

An estimate of the squared grouped CAR score is an example of a simple global test
statistic that may be useful, e.g., in studying gene set enrichment (e.g. Ackermann and
Strimmer 2009). The null density of the empirical estimate $R_s^2 = \sum_{j=1}^s \hat{\omega}_j^2$ for a set of size
$s$ is given by $f(R_s^2) = \mathrm{Beta}(R_s^2; \frac{s}{2}, \frac{n - s - 1}{2})$ which for $s = 1$ reduces to the null distribution
of the squared empirical CAR score, and for $s = d$ equals the distribution of the squared
empirical multiple correlation coefficient $R^2$.

Another related summary (used in particular in the next section) is the accumulated
squared CAR score $\Omega_k^2$ for the largest $k$ predictors. Arranging the CAR scores in
decreasing order of absolute magnitude $\omega_{(1)}, \ldots, \omega_{(d)}$ with $\omega_{(1)}^2 > \ldots > \omega_{(d)}^2$ this can be written
as

$$\Omega_k^2 = \sum_{j=1}^k \omega_{(j)}^2 .$$
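Both summaries are simple functions of the individual CAR scores. The following sketch uses a hypothetical vector `omega` and an illustrative helper `grouped_car`; neither is taken from the paper or its software.

```python
import numpy as np

# Hypothetical CAR scores of four predictors.
omega = np.array([0.5, -0.3, 0.1, 0.05])

def grouped_car(omega, idx):
    """Unsigned grouped CAR score of the variables with indices idx."""
    return np.sqrt(np.sum(omega[idx] ** 2))

# Grouped score for the set consisting of the first two predictors.
omega_grouped = grouped_car(omega, [0, 1])

# Accumulated squared CAR scores after ordering by decreasing magnitude:
# entry k - 1 is the sum over the k largest squared scores.
order = np.argsort(-omega ** 2)
accumulated = np.cumsum(omega[order] ** 2)

print(omega_grouped)
print(order, accumulated)
```

The last entry of `accumulated` equals the sum of all squared CAR scores and therefore the squared multiple correlation coefficient of the full model.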



4.8 CAR scores and information criteria for model selection 

CAR scores define a canonical ordering of the explanatory variables. Thus, variable 
selection using CAR scores is a simple matter of thresholding (squared) CAR scores. 
Intriguingly, this provides a direct link to model selection procedures using information 
criteria such as AIC or BIC. 

Classical model selection can be put into the framework of penalized residual sum
of squares (George 2000) with

$$\mathrm{RSS}_k^{penalized} = \mathrm{RSS}_k + \lambda k \, \hat{\sigma}^2_{full} ,$$

where $k \leq d$ is the number of included predictors and $\hat{\sigma}^2_{full}$ an estimate of the variance
of the residuals using the full model with all predictors included. The model selected
as optimal minimizes $\mathrm{RSS}_k^{penalized}$, with the penalty parameter $\lambda$ fixed in advance. The
choice of $\lambda$ corresponds to the choice of information criterion; see Tab. 4 for details.

With $\mathrm{RSS}_k / (n \sigma_Y^2)$ as empirical estimator of $1 - \Omega_k^2$, and $R^2$ as estimate of $\Omega^2$, we
rewrite the above as

$$\frac{\mathrm{RSS}_k^{penalized}}{n \sigma_Y^2} = 1 - \hat{\Omega}_k^2 + \frac{\lambda k (1 - R^2)}{n}
  = 1 - \sum_{j=1}^k \left( \hat{\omega}_{(j)}^2 - \frac{\lambda (1 - R^2)}{n} \right) .$$

This quantity decreases with $k$ as long as $\hat{\omega}_{(k)}^2 > \hat{\omega}_c^2 = \frac{\lambda (1 - R^2)}{n}$. Therefore, in terms of
CAR scores classical model selection is equivalent to thresholding $\hat{\omega}_j^2$ at critical level $\hat{\omega}_c^2$,
where predictors with $\hat{\omega}_j^2 \leq \hat{\omega}_c^2$ are removed. If $n$ is large or for a perfect fit ($R^2 = 1$) all
predictors are retained.

Table 4: Threshold parameter $\lambda$ for some classical model selection procedures.

Criterion   Reference                    Penalty parameter
AIC         Akaike (1974)                $\lambda = 2$
$C_p$       Mallows (1973)               $\lambda = 2$
BIC         Schwarz (1978)               $\lambda = \log(n)$
RIC         Foster and George (1994)     $\lambda = 2 \log(d)$
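The thresholding rule can be sketched in a few lines. The empirical CAR scores `omega_hat` and the sample size `n` are hypothetical, `car_threshold` is a name introduced here for the critical level $\lambda (1 - R^2)/n$, and $R^2$ is taken as the sum of the squared empirical CAR scores.

```python
import numpy as np

def car_threshold(lam, r2, n):
    """Critical level lam * (1 - R^2) / n for squared CAR scores."""
    return lam * (1.0 - r2) / n

# Hypothetical empirical CAR scores from n = 50 observations.
omega_hat = np.array([0.5, -0.3, 0.1, 0.05])
n, d = 50, omega_hat.size
r2 = float(omega_hat @ omega_hat)

# Penalty parameters of the classical criteria.
penalties = {"AIC": 2.0, "BIC": np.log(n), "RIC": 2.0 * np.log(d)}

# Keep predictors whose squared CAR score exceeds the critical level.
selected = {
    name: np.flatnonzero(omega_hat ** 2 > car_threshold(lam, r2, n))
    for name, lam in penalties.items()
}
for name, idx in selected.items():
    print(name, car_threshold(penalties[name], r2, n), idx)
```

In this example all three criteria retain the first two predictors; a larger penalty parameter raises the critical level and can only shrink the selected set.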

As an alternative to using a fixed cutoff we may also conduct model selection with
an adaptive choice of threshold. One such approach is to remove null-variables by
controlling false non-discovery rates (FNDR) as described in Ahdesmäki and Strimmer
(2010). The required null-model for computing FNDR from observed CAR scores $\hat{\omega}_j$ is
the same as when using marginal correlations. Alternatively, an optimal threshold may
be chosen, e.g., by minimizing cross-validation estimates of prediction error.



4.9 Grouping property, antagonistic variables and oracle CAR score 

A favorable feature of the elastic net procedure for variable selection is the "grouping
property" which enforces the simultaneous selection of highly correlated predictors
(Zou and Hastie 2005). Model selection using CAR scores also exhibits the grouping
property because predictors that are highly correlated have nearly identical CAR scores.
This can directly be seen from the definition $\omega = P^{1/2} b_{std}$ of the CAR score. For two
predictors $X_1$ and $X_2$ and correlation $\mathrm{Corr}(X_1, X_2) = \rho$ a simple algebraic calculation
shows that the difference between the two squared CAR scores equals

$$\omega_1^2 - \omega_2^2 = \left( (b_{std})_1^2 - (b_{std})_2^2 \right) \sqrt{1 - \rho^2} \, .$$

Therefore, the two squared CAR scores become identical with growing absolute value of
the correlation between the variables. This grouping property is intrinsic to the CAR
score itself and not a property of an estimator.

In addition to the grouping property the CAR score also exhibits an important
behavior with regard to antagonistic variables. If the regression coefficients of two
variables have opposing signs and these variables are in addition positively correlated
then the corresponding CAR scores decrease to zero. For example, with
$(b_{std})_2 = -(b_{std})_1$ we get

$$\omega_1 = -\omega_2 = (b_{std})_1 \sqrt{1 - \rho} \, .$$
This implies that antagonistic positively correlated variables will be bottom ranked. A
similar effect occurs for protagonistic variables that are negatively correlated, as with
$(b_{std})_1 = (b_{std})_2$ we have

$$\omega_1 = \omega_2 = (b_{std})_1 \sqrt{1 + \rho} \, ,$$

which decreases to zero for large negative correlation (i.e. for $\rho \to -1$).
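The two-predictor formulas for the grouping property and for antagonistic and protagonistic variables can be confirmed numerically from $\omega = P^{1/2} b_{std}$. In this sketch the coefficient values and the correlation `rho` are hypothetical, and `sym_power` and `car_two` are illustrative helper names.

```python
import numpy as np

def sym_power(M, p):
    """Power of a symmetric positive definite matrix via its eigendecomposition."""
    w, U = np.linalg.eigh(M)
    return (U * w ** p) @ U.T

def car_two(b1, b2, rho):
    """CAR scores of two predictors with correlation rho and coefficients b_std."""
    P = np.array([[1.0, rho], [rho, 1.0]])
    return sym_power(P, 0.5) @ np.array([b1, b2])

rho = 0.9

# Grouping: the squared scores differ by (b1^2 - b2^2) * sqrt(1 - rho^2).
w = car_two(1.0, 0.4, rho)
diff = w[0] ** 2 - w[1] ** 2

# Antagonistic variables: opposing signs and positive correlation.
w_antag = car_two(1.0, -1.0, rho)

# Protagonistic variables: same sign and negative correlation.
w_protag = car_two(1.0, 1.0, -rho)

print(diff, (1.0 - 0.4 ** 2) * np.sqrt(1 - rho ** 2))
print(w_antag, np.sqrt(1 - rho))
print(w_protag, np.sqrt(1 - rho))
```

With a correlation of 0.9 the standardized coefficients of magnitude 1 are mapped to CAR scores of magnitude $\sqrt{0.1} \approx 0.32$ in both the antagonistic and the protagonistic case.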

Further insight into the CAR score is obtained by considering an "oracle version"
where it is known in advance which predictors are truly non-null. Specifically, we
assume that the regression coefficients can be written as

$$b_{std} = \begin{pmatrix} b_{std,\text{non-null}} \\ 0 \end{pmatrix}$$

and that there is no correlation between null and non-null variables so that the correlation
matrix $P$ has block-diagonal structure

$$P = \begin{pmatrix} P_{\text{non-null}} & 0 \\ 0 & P_{\text{null}} \end{pmatrix} .$$

The resulting oracle CAR score

$$\omega = P^{1/2} b_{std} = \begin{pmatrix} P_{\text{non-null}}^{1/2} \, b_{std,\text{non-null}} \\ 0 \end{pmatrix}$$

is exactly zero for the null variables. Therefore, asymptotically the null predictors will
be identified by the CAR score with probability one as long as the employed estimator is
consistent.

5 Applications 

In this section we demonstrate variable selection by thresholding CAR scores in a 
simulation study and by analyzing experimental data. As detailed below, we considered 
large and small sample settings for both synthetic and real data. 

5.1 Software 



All analyses were done using the R platform (R Development Core Team 2010). A
corresponding R package "care" implementing CAR estimation and CAR regression is
available from the authors' web page (http://www.strimmerlab.org/software/care/)
and also from the CRAN archive (http://cran.r-project.org/web/packages/care/).
The code for the computer simulation is also available from our website.

For comparison we fitted in our study lasso and elastic net regression models using
the algorithms available in the R package "scout" (Witten and Tibshirani 2009). In
addition, we employed the boosting algorithm for linear models as implemented in
the R package "mboost" (Hothorn and Bühlmann 2006), ordinary least squares with
no variable selection (OLS), with partial correlation ranking (PCOR) and with variable
ranking by the Genizi method.
5.2 Simulation study

In our simulations we broadly followed the setup employed in Zou and Hastie (2005),
Witten and Tibshirani (2009) and Wang et al. (2011).



Specifically, we considered the following scenarios: 

• Example 1: 8 variables with $b = (3, 1.5, 0, 0, 2, 0, 0, 0)^T$. The predictors exhibit
autoregressive correlation with $\mathrm{Corr}(X_j, X_k) = 0.5^{|j-k|}$.

• Example 2: As Example 1 but with $\mathrm{Corr}(X_j, X_k) = 0.85^{|j-k|}$.

• Example 3: 40 variables with $b = (3, 3, 3, 3, 3, -2, -2, -2, -2, -2, 0, \ldots, 0)^T$. The
correlation between all pairs of the first 10 variables is set to 0.9, and otherwise set
to 0.

• Example 4: 40 variables with $b = (3, 3, -2, 3, 3, -2, 0, \ldots, 0)^T$. The pairwise
correlations among the first three variables and among the second three variables equal
0.9 and are otherwise set to 0.

The intercept was set to $a = 0$ in all scenarios. We generated samples $x_i$ by drawing
from a multivariate normal distribution with unit variances, zero means and correlation
structure $P$ as indicated for each simulation scenario. To compute $y_i = b^T x_i + \varepsilon_i$ we
sampled the error $\varepsilon_i$ from a normal distribution with zero mean and standard deviation
$\sigma$ (so that $\mathrm{Var}(\varepsilon) = \mathrm{Var}(Y - Y^*) = \sigma^2$). In Examples 1 and 2 the dimension is $d = 8$
and the sample sizes considered were $n = 50$ and $n = 100$ to represent a large sample
setting. In contrast, for Examples 3 and 4 the dimension is $d = 40$ and sample sizes were
small (from $n = 10$ to $n = 100$). In order to vary the ratio of signal and noise variances
we used different degrees of unexplained variance ($\sigma = 1$ to $\sigma = 6$). For fitting the
regression models we employed a training data set of size $n$. The tuning parameter of
each approach was optimized using an additional independent validation data set of the
same size $n$. In the CAR, PCOR and Genizi approach the tuning parameter corresponds
directly to the number of included variables, whereas for elastic net, lasso, and boosting
the tuning parameter(s) corresponds to a regularization parameter.

For each estimated set of regression coefficients $\hat{b}$ we computed the model error and
the model size. All simulations were repeated 200 times, and the average relative model
error as well as the median model size was reported. For estimating CAR scores and
associated regression coefficients we used in the large sample cases (Examples 1 and 2)
the empirical estimator and otherwise (Examples 3 and 4) shrinkage estimates.
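The data-generating step of Example 1 and the relative model error can be sketched as follows. This is not the authors' simulation code: the function names, the seed and the use of a plain OLS fit are assumptions for illustration, and no tuning on a validation set is performed.

```python
import numpy as np

def ar_correlation(d, rho):
    """Autoregressive correlation matrix with entries rho^|j - k|."""
    idx = np.arange(d)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def simulate(n, b, P, sigma, rng):
    """Draw n samples with zero means, unit variances, correlation P, intercept 0."""
    X = rng.multivariate_normal(np.zeros(len(b)), P, size=n)
    y = X @ b + rng.normal(scale=sigma, size=n)
    return X, y

def relative_model_error(a_hat, b_hat, b, P, sigma):
    """Model error relative to the noise variance (true intercept is 0, Sigma = P)."""
    db = b_hat - b
    return (db @ P @ db + a_hat ** 2) / sigma ** 2

rng = np.random.default_rng(0)
b = np.array([3, 1.5, 0, 0, 2, 0, 0, 0], dtype=float)   # Example 1
P = ar_correlation(8, 0.5)
sigma = 3.0

# One training set of size n = 50 and an OLS fit without variable selection.
X, y = simulate(50, b, P, sigma, rng)
coef = np.linalg.lstsq(np.column_stack([np.ones(len(y)), X]), y, rcond=None)[0]
rme_ols = relative_model_error(coef[0], coef[1:], b, P, sigma)
print(rme_ols)
```

Averaging `rme_ols` over repeated draws would give a Monte Carlo estimate of the relative model error of OLS in this scenario; because the predictors have unit variances, the covariance matrix in the model error equals the correlation matrix `P`.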

5.3 Results from the simulation study 

The results are summarized in Tab. 5 and Tab. 6. In all investigated scenarios model
selection by CAR scores is competitive with elastic net regression, and typically outper- 
forms the lasso and OLS with no variable selection and OLS with partial correlation. 
It is also in most cases distinctively better than boosting. Genizi's variable selection 
criterion also performs very well, with a similar performance to CAR scores in many 
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Table 5: Average relative model error (x 1000) and its standard deviation as well as the 
mean true and false positives (TP+FP) in alternating rows for Examples 1 and 2. These 
simulations represent large sample settings (d = 8 with n = 40 to n = 100). 





CAR* 


Elastic Net 


Lasso 


Boost 


OLS 


PCOR 


Genizi 


Example 

M = 50 

(7=1 


1 (true model size = 3) 












107 (5) 


135 (7) 


132 (6) 


390 (24) 


217 (8) 


107 (5) 


109 (6) 




3.0+1.2 


3.0+1.9 


3.0+1.8 


3.0+2.6 


3.0+5.0 


3.0+0.7 


3.0+1.3 


(7 = 3 


119 (7) 


130 (6) 


148 (6) 


151 (6) 


230 (9) 


153 (8) 


129 (7) 




3.0+1.3 


3.0+2.6 


3.0+1.9 


3.0+3.5 


3.0+5.0 


2.9+0.9 


3.0+1.3 


(7 = 6 


143 (6) 


127 (5) 


152 (6) 


149 (8) 


227 (8) 


163 (6) 


139 (6) 




2.5+1.2 


2.8+2.4 


2.6+2.0 


2.8+3.7 


3.0+5.0 


2.3+1.4 


2.5+1.1 


n = 100 
















(7 = 1 


53(3) 


64(3) 


59(3) 


219 (18) 


97(4) 


54(3) 


55(3) 




3.0+1.0 


3.0+1.9 


3.0+1.5 


3.0+2.4 


3.0+5.0 


3.0+0.8 


3.0+1.2 


(7 = 3 


55(3) 


58(2) 


59(3) 


78(3) 


99(3) 


59(3) 


56(4) 




3.0+1.2 


3.0+2.1 


3.0+1.9 


3.0+3.6 


3.0+5.0 


3.0+0.8 


3.0+1.0 


(7 = 6 


65(3) 


64(3) 


69(3) 


66(3) 


97(3) 


76(3) 


65(3) 




2.8+1.2 


2.9+2.4 


2.9+2.1 


3.0+3.7 


3.0+5.0 


2.6+1.3 


2.8+1.5 


Example 
n=50 

(7= 1 


2 (true model size = 3) 












110 (5) 


147 (7) 


134 (6) 


716 (55) 


230 (9) 


120 (8) 


130 (6) 




3.0+1.4 


3.0+2.4 


3.0+2.0 


3.0+3.1 


3.0+5.0 


3.0+0.9 


3.0+2.3 


(7 = 3 


127 (5) 


124 (5) 


139 (6) 


165 (7) 


220 (8) 


178 (9) 


158 (8) 




2.8+1.6 


3.0+3.0 


2.8+2.2 


2.8+3.5 


3.0+5.0 


2.4+1.6 


2.8+2.1 


(7 = 6 


121 (5) 


95(4) 


121 (6) 


110 (5) 


232 (9) 


165 (7) 


135 (5) 




2.2+1.5 


2.7+3.2 


2.2+1.9 


2.5+3.4 


3.0+5.0 


1.8+1.5 


2.2+1.6 


n = 100 
















(7 = 1 


49(3) 


67(3) 


61(3) 


325 (28) 


95(3) 


52(3) 


60(3) 




3.0+1.1 


3.0+2.2 


3.0+1.9 


3.0+3.0 


3.0+5.0 


3.0+1.0 


3.0+2.0 


(7 = 3 


62(3) 


63(3) 


64(3) 


83(4) 


101 (4) 


78(4) 


62(4) 




3.0+1.5 


3.0+2.7 


3.0+2.2 


3.0+3.3 


3.0+5.0 


2.8+1.2 


3.0+1.9 


(7 = 6 


64(3) 


53(2) 


59(2) 


54(2) 


100 (4) 


77(3) 


66(3) 




2.6+1.7 


2.9+3.1 


2.6+2.1 


2.7+3.3 


3.0+5.0 


2.0+1.4 


2.7+1.8 



* using empirical CAR estimator.

Table 6: Average relative model error (× 1000) and its standard deviation as well as the
mean true and false positives (TP+FP) in alternating rows for Examples 3 and 4. These
simulations represent small sample settings (d = 40 with n = 10 to n = 100).

         CAR*        Elastic Net Lasso       Boost       OLS         PCOR        Genizi

Example 3 (true model size = 10)
n = 10
σ = 3    1482 (44)   1501 (45)   1905 (75)   2203 (66)   -
         6.1+7.0     6.3+11.5    2.1+4.7     2.4+13.7    -
n = 20
σ = 3    838 (30)    950 (26)    1041 (29)   1421 (44)   -
         6.4+2.7     5.6+6.2     2.5+4.2     2.8+12.0    -
n = 50
σ = 3    358 (11)    571 (10)    608 (8)     805 (12)    5032 (214)  888 (27)    364 (12)
         8.5+0.6     5.2+2.9     3.3+3.3     4.2+13.0    10.0+30.0   2.5+2.2     8.4+1.1
n = 100
σ = 3    172 (6)     488 (4)     525 (6)     569 (8)     693 (14)    406 (10)    155 (5)
         9.5+0.7     6.0+6.8     5.9+10.8    7.1+17.3    10.0+30.0   6.9+3.1     9.6+0.6

Example 4 (true model size = 6)
n = 10
σ = 6    835 (24)    1061 (34)   1684 (60)   1113 (39)   -
         3.5+9.3     4.5+20.2    1.6+6.4     1.5+9.8     -
n = 20
σ = 6    527 (18)    767 (25)    925 (40)    791 (22)    -
         4.2+7.0     4.4+13.2    2.4+7.5     2.0+9.4     -
n = 50
σ = 6    200 (11)    226 (9)     293 (14)    359 (11)    4991 (176)  1075 (67)   204 (7)
         4.9+3.0     4.3+4.7     3.0+4.0     3.3+12.9    6.0+36.0    2.8+5.0     5.5+0.8
n = 100
σ = 6    87 (4)      107 (4)     112 (3)     168 (4)     699 (16)    232 (8)     94 (4)
         5.4+1.2     4.5+2.9     3.5+2.8     3.8+12.2    6.0+36.0    4.6+1.7     5.8+0.9

* using shrinkage CAR estimator.

[Figure 1: seven panels (Shrinkage CAR, Elastic Net, Lasso, Boost, OLS, PCOR, Genizi), each showing the estimated coefficients up to b15 along the horizontal axis.]

Figure 1: Distribution of estimated regression coefficients for Example 3 with n = 50
and σ = 3. Coefficients for variables X16 to X40 are not shown but are similar to those of
X11 to X15. The scale of the plots for OLS, PCOR and Genizi is different from that of the
other four methods.



Table 7: Population quantities for Example 1 with σ = 3.

Quantity            X1     X2     X3     X4     X5     X6     X7     X8
b                   3      1.5    0      0      2      0      0      0
b_std               0.55   0.27   0      0      0.36   0      0      0
P̃_XY (partial)      0.65   0.36   0      0      0.46   0      0      0
P_XY (marginal)     0.70   0.59   0.36   0.32   0.43   0.22   0.11   0.05
ω                   0.60   0.40   0.15   0.13   0.36   0.10   0.04   0.02
φ^CAR               0.36   0.16   0.02   0.02   0.13   0.01   0.00   0.00

Numbers are rounded to two digits after the point.



cases, except for Example 2. Tab. 5 and Tab. 6 also show the true and false positives for
each method. The regression models selected by the CAR score approach often exhibit
the largest number of true positives and the smallest number of false positives, which
explains its effectiveness.

Fig. 1 shows the distribution of the estimated regression coefficients for the investigated
methods over the 200 repetitions for Example 3 with n = 50 and σ = 3. This figure
demonstrates that using CAR scores, unlike lasso, elastic net, and boosting, recovers
the regression coefficients of variables X6 to X10 that have negative signs. Moreover, in
this setting the CAR score regression coefficients have a much smaller variability than
those obtained using the OLS-Genizi method.

The simulations for Examples 1 and 2 represent cases where the null variables X3, X4,
X6, X7, and X8 are correlated with the non-null variables X1, X2 and X5. In such a setting
the variable importance φ^CAR(X_j) assigned by squared CAR scores to the null variables
is non-zero. For illustration, we list in Tab. 7 the population quantities for Example 1
with σ = 3. The squared multiple correlation coefficient is Ω^2 = 0.70 and the ratio of
signal variance to noise variance equals Ω^2 / (1 − Ω^2) = 2.36. Standardized regression
coefficients b_std as well as partial correlations P̃_XY are zero whenever the corresponding
regression coefficient b vanishes. In contrast, marginal correlations P_XY, CAR scores ω
and the variable importance φ^CAR(X_j) are all non-zero even for b_j = 0. This implies that
for large sample size in the setting of Example 1 all variables (but in particular also X3,
X4, and X6) carry information about the response, albeit only weakly and indirectly for
variables with b_j = 0.
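
The population quantities discussed above can be recomputed with a short NumPy sketch. It rests on one assumption that is not stated in this section: the correlation structure of Example 1 is taken to be P_ij = 0.5^|i-j|. Under this assumption the values of b_std, P_XY, Ω^2 and the signal-to-noise ratio agree with those reported in the text and in Tab. 7 after rounding. The relations used are P_XY = P b_std, ω = P^(-1/2) P_XY = P^(1/2) b_std and φ^CAR(X_j) = ω_j^2; the variable names are illustrative.

```python
import numpy as np

d, sigma = 8, 3.0
b = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])  # regression coefficients of Example 1
idx = np.arange(d)
P = 0.5 ** np.abs(idx[:, None] - idx[None, :])  # ASSUMED correlation structure

signal_var = b @ P @ b                  # variance explained by the predictors
var_y = signal_var + sigma ** 2         # total variance of the response
b_std = b / np.sqrt(var_y)              # standardized coefficients (unit-variance predictors)
omega_sq_total = signal_var / var_y     # squared multiple correlation, Omega^2
snr = signal_var / sigma ** 2           # equals Omega^2 / (1 - Omega^2)

p_xy = P @ b_std                        # marginal correlations

# Symmetric matrix square root of P via its eigendecomposition.
evals, evecs = np.linalg.eigh(P)
P_sqrt = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
omega = P_sqrt @ b_std                  # CAR scores
phi = omega ** 2                        # variable importance; sums to Omega^2

print("b_std  :", np.round(b_std, 2))
print("P_XY   :", np.round(p_xy, 2))
print("omega  :", np.round(omega, 2))
print("phi    :", np.round(phi, 2))
print("Omega^2:", round(float(omega_sq_total), 2), " SNR:", round(float(snr), 2))
```

The sum of the squared CAR scores equals Ω^2 by construction, which is the decomposition of the proportion of explained variance referred to in the text.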

In the literature on variable importance the axiom of "proper exclusion" is frequently
encountered, i.e. it is demanded that the share of Ω^2 allocated to a variable X_j with
b_j = 0 is zero (Grömping 2007). The squared CAR scores violate this principle if null



and non-null variables are correlated. However, in our view this violation makes perfect
sense, as in this case the null variables are informative about Y and thus may be useful
for prediction. Moreover, because of the existence of equivalence classes in graphical
models one can construct an alternative regression model with the same fit to the data

Table 8: Ranking of variables and selected models using various variable selection
approaches on the diabetes data. A variable belongs to the selected model if its rank
does not exceed the model size given in the last row.

Rank         P̃_XY*       P_XY*       CAR*        Elastic Net Lasso       Boost
             (partial)   (marginal)
age          10          8           8           10          -           -
sex          4           10          7           4           5           5
bmi          1           1           1           1           1           1
bp           2           3           3           3           3           3
s1           5           7           9           9           6           6
s2           6           9           10          7           -           -
s3           9           5           4           5           4           4
s4           7           4           5           6           -           -
s5           3           2           2           2           2           2
s6           8           6           6           8           7           7
Model size   4           9           6           10          7           7

* empirical estimates.



that shows no correlation between null and non-null variables but which then necessarily
includes additional variables. A related argument against proper exclusion is found in
Grömping (2007).



5.4 Diabetes data 

Next we reanalyzed a low-dimensional benchmark data set on the disease progression
of diabetes discussed in Efron et al. (2004). There are d = 10 covariates, age (age), sex
(sex), body mass index (bmi), blood pressure (bp) and six blood serum measurements
(s1, s2, s3, s4, s5, s6), on which data were collected from n = 442 patients. As
d < n we used empirical estimates of CAR scores and ordinary least squares regression
coefficients in our analysis. The data were centered and standardized beforehand.

A particular challenge of the diabetes data set is that it contains two variables (s1
and s2) that are highly positively correlated but behave in an antagonistic fashion.
Specifically, their regression coefficients have opposite signs so that in prediction
the two variables cancel each other out. Fig. 2 shows all regression models that arise
when covariates are added to the model in the order of decreasing variable importance
given by φ^CAR(X_j). As can be seen from this plot, the variables s1 and s2 are ranked
least important and included only in the last two steps.
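
The down-weighting of antagonistic variables can be illustrated with a small population-level calculation. The three-variable setup below is hypothetical and is not the diabetes data: X1 and X2 have correlation 0.9 and regression coefficients of opposite sign, while X3 is uncorrelated with both. The sketch assumes the population relations ω = P^(1/2) b_std and φ^CAR(X_j) = ω_j^2.

```python
import numpy as np

rho = 0.9
P = np.array([[1.0, rho, 0.0],
              [rho, 1.0, 0.0],
              [0.0, 0.0, 1.0]])      # X1 and X2 highly positively correlated
b = np.array([1.0, -1.0, 1.0])       # X1 and X2 act antagonistically
sigma = 1.0

var_y = b @ P @ b + sigma ** 2
b_std = b / np.sqrt(var_y)           # all three have the same absolute value

evals, evecs = np.linalg.eigh(P)
P_sqrt = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
omega = P_sqrt @ b_std               # CAR scores
phi = omega ** 2                     # variable importance
order = np.argsort(-phi)             # order of inclusion, most important first

print("b_std:", np.round(b_std, 3))
print("phi  :", np.round(phi, 3))
print("order:", (order + 1).tolist())
```

Although the three standardized regression coefficients have equal magnitude, the squared CAR scores of the antagonistic pair are smaller than that of X3 by the factor 1 - rho, because the contributions of X1 and X2 to the prediction largely cancel.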

For the empirical estimates the exact null distributions are available; therefore we
also computed p-values for the estimated CAR scores, marginal correlations P_XY and
partial correlations P̃_XY, and selected those variables for inclusion with a p-value smaller
than 0.05. In addition, we computed lasso, elastic net and boosting regression models.

[Figure 2 plot: CAR Regression Models for Diabetes Data]

Figure 2: Estimates of regression coefficients for the diabetes study. Variables are
included in the order of empirical squared CAR scores, and the corresponding regression
coefficients are estimated by ordinary least squares. The antagonistic correlated variables
s1 and s2 are included only in the last two steps.

Table 9: Cross-validation prediction errors resulting from regression models for the gene
expression data.

Model (Size)        Prediction error
Lasso (36)          0.4006 (0.0011)
Elastic Net (85)    0.3417 (0.0068)
CAR (36) *          0.3357 (0.0070)
CAR (60) *          0.3049 (0.0064)
CAR (85) *          0.2960 (0.0059)

* shrinkage estimates.



The results are summarized in Tab. 8. All models include bmi, bp and s5 and thus
agree that those three explanatory variables are most important for prediction of diabetes
progression. Using marginal correlations and the elastic net both lead to large models of
size 9 and 10, respectively, whereas the CAR feature selection, in accordance with the
simulation study, results in a smaller model. The CAR model and the model determined
by partial correlations are the only ones not including either of the variables s1 or s2.

In addition, we also compared CAR models selected by the various penalized RSS
approaches. Using the Cp / AIC rule on the empirical CAR scores results in 8 included
variables, RIC leads to 7 variables, and BIC to the same 6 variables as in Tab. 8.
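
As a rough illustration of how such rules act on a CAR-ordered sequence of nested models, the sketch below uses a generic penalized RSS criterion, RSS_k / σ^2 + λ k, with the commonly used penalties λ = 2 (Cp / AIC), λ = 2 log d (RIC) and λ = log n (BIC). This is an assumption made here: the exact formulation used in the analysis is defined elsewhere in the paper and may differ in detail. The data are synthetic, and the function names and the toy coefficients are hypothetical.

```python
import numpy as np

def car_scores(X, y):
    """Empirical CAR scores omega = R^(-1/2) r_xy (requires n > d)."""
    n = X.shape[0]
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    ys = (y - y.mean()) / y.std(ddof=1)
    R = Xs.T @ Xs / (n - 1)
    r_xy = Xs.T @ ys / (n - 1)
    evals, evecs = np.linalg.eigh(R)
    return evecs @ ((evecs.T @ r_xy) / np.sqrt(evals))

def penalized_rss_size(X, y, order, lam):
    """Model size k minimizing RSS_k / sigma2 + lam * k over the nested models given by order."""
    n, d = X.shape
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    rss = [float(yc @ yc)]                      # k = 0: intercept only
    for k in range(1, d + 1):
        cols = order[:k]
        coef, *_ = np.linalg.lstsq(Xc[:, cols], yc, rcond=None)
        resid = yc - Xc[:, cols] @ coef
        rss.append(float(resid @ resid))
    sigma2 = rss[-1] / (n - d - 1)              # noise variance estimated from the full model
    crit = [r / sigma2 + lam * k for k, r in enumerate(rss)]
    return int(np.argmin(crit))

rng = np.random.default_rng(2)
n, d = 442, 10                                  # same dimensions as the diabetes data, synthetic values
X = rng.normal(size=(n, d))
b = np.array([2.0, 1.0, 0.5, 0.3, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0])  # toy coefficients
y = X @ b + rng.normal(size=n)

order = np.argsort(-car_scores(X, y) ** 2)
penalties = [("Cp/AIC", 2.0), ("RIC", 2 * np.log(d)), ("BIC", np.log(n))]
sizes = {name: penalized_rss_size(X, y, order, lam) for name, lam in penalties}
print(sizes)
```

For n = 442 and d = 10 the penalties are ordered as 2 < 2 log d < log n, so the selected model size cannot increase from Cp / AIC to RIC to BIC, which is the same ordering as the sizes 8, 7 and 6 reported above.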

5.5 Gene expression data 

Subsequently, we analyzed data from a gene-expression study investigating the relation
of aging and gene-expression in the human frontal cortex (Lu et al. 2004). Specifically,
the age of n = 30 patients was recorded, ranging from 26 to 106 years, and the expression
of d = 12 625 genes was measured by microarray technology. In our analysis we used
the age as metric response Y and the genes as explanatory variables X. Thus, our aim
was to find genes that help to predict the age of the patient.

In preprocessing we removed genes with negative values and log-transformed the
expression values of the remaining d = 11 940 genes. We centered and standardized the
data and computed empirical marginal correlations. Subsequently, based on marginal
correlations we filtered out all genes with local false non-discovery rates (FNDR) smaller
than 0.2, following Ahdesmäki and Strimmer (2010). Thus, in this prescreening step we
retained the d = 403 variables with local false-discovery rates smaller than 0.8.

On this 30 × 403 data matrix we fitted regression models using shrinkage CAR, lasso,
and elastic net. The optimal tuning parameters were selected by minimizing prediction
error estimated by 5-fold cross-validation with 100 repeats. Cross-validation included
model selection as an integral step, e.g., CAR scores were recomputed in each repetition
in order to avoid downward bias. A summary of the results is found in Tab. 9. The
prediction error of the elastic net regression model is substantially smaller than that

[Figure 3 plot: CAR Models for the Gene Expression Data]

Figure 3: Comparison of CV prediction errors of CAR regression models of various sizes
for the gene expression data.



of the lasso model, at the cost of 49 additionally included covariates. The regression
model suggested by the CAR approach for the same model sizes improves over both
models. As can be seen from Fig. 3, the optimal CAR regression model has a size of about
60 predictors. The inclusion of additional explanatory variables does not substantially
improve prediction accuracy.
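
The point that variable ranking has to be repeated inside every cross-validation fold can be sketched as follows. This is a simplified illustration on synthetic data, not the analysis code: the fixed shrinkage intensity `lam` is a hypothetical stand-in for a data-driven shrinkage estimator, the regression coefficients of the selected variables are fitted by ordinary least squares rather than by shrinkage, and the function names, the number of repeats and the model size k are arbitrary choices.

```python
import numpy as np

def shrink_car_scores(X, y, lam=0.5):
    """CAR scores from a correlation matrix shrunken towards the identity (fixed intensity lam)."""
    n, d = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    ys = (y - y.mean()) / y.std(ddof=1)
    R = (1.0 - lam) * (Xs.T @ Xs) / (n - 1) + lam * np.eye(d)  # positive definite even if d > n
    r_xy = Xs.T @ ys / (n - 1)
    evals, evecs = np.linalg.eigh(R)
    return evecs @ ((evecs.T @ r_xy) / np.sqrt(evals))

def cv_error(X, y, k, n_folds=5, n_repeats=10, seed=0):
    """Repeated k-fold CV error of a size-k CAR model, with the ranking redone in every fold."""
    rng = np.random.default_rng(seed)
    n = len(y)
    errs = []
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        for test in np.array_split(perm, n_folds):
            train = np.setdiff1d(np.arange(n), test)
            # Model selection uses the training part only, so the test part stays untouched.
            omega = shrink_car_scores(X[train], y[train])
            idx = np.argsort(-omega ** 2)[:k]
            mx, my = X[train][:, idx].mean(axis=0), y[train].mean()
            coef, *_ = np.linalg.lstsq(X[train][:, idx] - mx, y[train] - my, rcond=None)
            pred = my + (X[test][:, idx] - mx) @ coef
            errs.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errs))

rng = np.random.default_rng(4)
n, d = 30, 50                                   # small-sample setting with d > n
X = rng.normal(size=(n, d))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)
err = cv_error(X, y, k=3)
print("CV prediction error for k = 3:", round(err, 3))
```

Ranking the variables once on the full data and cross-validating only the final fit would let information from the test folds leak into the selection, which is the source of the downward bias mentioned above.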

6 Conclusion 

We have proposed correlation-adjusted marginal correlations ω, or CAR scores, as a
means of assigning variable importance to individual predictors and of performing variable
selection. This approach is based on simultaneous orthogonalization of the covariables
by Mahalanobis-decorrelation and subsequently estimating the remaining correlation
between the response and the sphered predictors.

We have shown that CAR scores not only simplify the regression equations but
more importantly result in a canonical ordering of variables that provides the basis for a
simple yet highly effective procedure for variable selection. Because of the orthogonal
compatibility of squared CAR scores they can also be used to assign variable importance
to groups of predictors. In simulations and by analyzing experimental data we have
shown that CAR score regression is competitive in terms of prediction and model error
with regression approaches such as elastic net, lasso or boosting.

Since writing this paper in 2010 we have also become aware of the "tilted
correlation" approach to variable selection (Cho and Fryzlewicz 2011). The tilted
correlation, though not identical to the CAR score, has the same objective, namely
to provide a measure of the contribution of each covariable in predicting the response
while taking account of the correlation among explanatory variables.

In summary, as exemplified in our analysis we suggest the following strategy for 
analyzing high-dimensional data, using CAR scores for continuous and CAT scores for 
categorical response: 

1. Prescreen predictor variables using marginal correlations (or t-scores) with an
adaptive threshold determined, e.g., by controlling FNDR (Ahdesmäki and Strimmer
2010).



2. Rank the remaining variables by their squared CAR (or CAT) scores. 

3. If desired, group variables and compute grouped CAR (or CAT) scores. 
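
The three steps above can be sketched on synthetic data as follows. Several simplifications are assumptions of this sketch rather than part of the proposed strategy: the prescreening keeps a fixed number of variables instead of using an adaptive FNDR-based threshold, empirical rather than shrinkage estimates are used, the grouping is arbitrary, and the grouped importance is taken to be the sum of the squared CAR scores of the group members.

```python
import numpy as np

def car_scores(X, y):
    """Empirical CAR scores omega = R^(-1/2) r_xy (requires n > d)."""
    n = X.shape[0]
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    ys = (y - y.mean()) / y.std(ddof=1)
    R = Xs.T @ Xs / (n - 1)
    r_xy = Xs.T @ ys / (n - 1)
    evals, evecs = np.linalg.eigh(R)
    return evecs @ ((evecs.T @ r_xy) / np.sqrt(evals))

rng = np.random.default_rng(3)
n, d = 200, 30
X = rng.normal(size=(n, d))
y = X[:, 0] - X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=n)

# Step 1: prescreen by absolute marginal correlation (fixed cutoff of 10 variables, hypothetical).
r_xy = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(d)])
keep = np.argsort(-np.abs(r_xy))[:10]

# Step 2: rank the retained variables by their squared CAR scores.
omega = car_scores(X[:, keep], y)
phi = omega ** 2
ranking = keep[np.argsort(-phi)]

# Step 3: grouped importance as the sum of squared CAR scores within each (arbitrary) group.
groups = {"A": np.arange(0, 5), "B": np.arange(5, 10)}  # positions within `keep`
group_importance = {g: float(phi[pos].sum()) for g, pos in groups.items()}

print("ranking:", ranking.tolist())
print("group importance:", {g: round(v, 3) for g, v in group_importance.items()})
```

Because the squared CAR scores add up to the proportion of explained variance of the retained variables, the group importances in this sketch add up to the same total.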

Currently, we are studying algorithmic improvements to enable shrinkage estimation of
CAT and CAR scores even for very large numbers of predictors and correlation matrices,
which may render the prescreening step above unnecessary in many cases.